add checkm2 #6542

Open
wants to merge 22 commits into
base: main

astrovsky01
Contributor

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@astrovsky01 marked this pull request as draft November 13, 2024 00:12
Contributor

@bernt-matthias left a comment

Excellent timing: One of my users just asked for the tool :)

I could contribute a data manager.

@astrovsky01 marked this pull request as ready for review November 15, 2024 20:15
Contributor

@bernt-matthias left a comment

Good from my side.

#The <version> column indicates the checkm2 version that generated the database

#
#diamond_db_1.0.2 Diamond database 1.0.2 /mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd
Member

Is this really a Diamond DB?
If so, this is interesting. Should we have a general Diamond location file and data manager (DM), with some tag for the different tools?

Contributor

I think so. And I agree that it would be interesting.

But it would be good to know and store the Diamond version that was used to generate it, wouldn't it? That seems difficult to find out from the sources: the tool just downloads the latest version from Zenodo (and I could not even find the link). Let me check whether diamond dbinfo could help.

Contributor

Nice:

> diamond dbinfo -d uniref100.KO.1.dmnd

diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404

Should we do this? Add columns tool, db_format_version, diamond_build?
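A minimal sketch of how the extra columns proposed above could be derived from the `diamond dbinfo` output shown in this comment. The function name `parse_dbinfo` and the `tool` value are hypothetical, and the column names simply follow the suggestion in this thread; nothing here is taken from an existing data manager.

```python
import re


def parse_dbinfo(text: str) -> dict:
    """Collect the 'key = value' lines printed by `diamond dbinfo`.

    Keys are lower-cased and spaces become underscores, so
    'Database format version' becomes 'database_format_version'.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^=]+?)\s*=\s*(\S+)\s*$", line)
        if match:
            key = match.group(1).lower().replace(" ", "_")
            fields[key] = match.group(2)
    return fields


# Output copied from the comment above.
SAMPLE = """\
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404
"""

info = parse_dbinfo(SAMPLE)

# Hypothetical extra columns for the data table, named as proposed in the thread.
extra_columns = {
    "tool": "checkm2",  # assumption: a free-form tag identifying the consuming tool
    "db_format_version": info["database_format_version"],
    "diamond_build": info["diamond_build"],
}
print(extra_columns)
```

In a real data manager the text would come from running `diamond dbinfo -d <database>` rather than from a hard-coded string.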

Member

Do we need diamond_build? But yes, we should do that :)

Thanks!

Member

@astrovsky01 do you think you can work on this?

Contributor Author

After some digging, I found that CheckM2 doesn't actually work with all Diamond databases. It has an internal checksum check to make sure the database is the specific one fetched by its database download command:

https://github.com/chklovski/CheckM2/blob/319dae65f1c7f2fc1c0bb160d90ac3ba64ed9457/checkm2/versionControl.py#L74

As such, I think that while a general Diamond DB data manager would be good to have, a specific one for CheckM2 is also a good idea.
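For readers unfamiliar with this kind of guard, here is a generic sketch of a file checksum comparison. It is not CheckM2's code: the hash algorithm (SHA-256), the function names, and the source of the expected value are all assumptions made for illustration. The actual check is in the versionControl.py file linked above and may work differently.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_database(path, expected_hex):
    """Compare a file's digest with a known value (hypothetical helper).

    A tool that ships such a value alongside its code would reject any
    database file whose digest differs, even if the file is otherwise a
    valid Diamond database.
    """
    return file_sha256(path) == expected_hex.lower()
```

A check like this is why a database built independently with `diamond makedb` would not be interchangeable with the downloaded one, which supports keeping a dedicated data manager.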

Contributor

Yes, let's go ahead, I would say.

My feeling is that a general data manager would be too complex, and multiple data managers writing to the same data table also seems confusing. Maybe it's better to have tools load multiple data tables?

Contributor Author

I was playing around with writing basically extra labels, but that requires someone on the other end to parse the table. Also, depending on the tool, you start to get additional bloat in the requirements. At the very least, CheckM2's tool would require its conda package, and that is just an extra dependency for the other tools, even when they don't need it. I think it's a good idea conceptually, but maybe not for this specific case.

Contributor

Can you elaborate? I do not understand what you want to say.

> writing basically extra labels

What labels?

> Also, depending on the tool, you start to get additional bloat in the requirements.

How? We would just load another data table; we need no requirements for this.

Contributor Author

Oh, I just mean the case where tools share the table but can't use all of its entries. I was referring to the tool label you mentioned at the top of the thread.
And I meant requirements for the data manager itself. In this case, you'd need the CheckM2 conda package on top of the diamond package, as opposed to just the diamond package in the diamond_build_db data manager that already exists. If other tools operate similarly to CheckM2 in the future, they'd need their conda packages added to the data manager's XML.
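To make the "someone on the other end parsing the table" concern concrete, here is a small sketch of what a consumer of a shared, tool-tagged table would have to do. The column layout (`value`, `name`, `tool`, `path`), the sample rows, and the paths are all hypothetical. Galaxy has its own machinery for reading data tables, so this only illustrates the idea, not how a tool wrapper would implement it.

```python
# Hypothetical shared .loc content: tab-separated, with a 'tool' tag column.
SAMPLE_LOC = "\n".join([
    "#value\tname\ttool\tpath",
    "\t".join(["checkm2_db", "CheckM2 Diamond database", "checkm2",
               "/data/checkm2/uniref100.KO.1.dmnd"]),
    "\t".join(["generic_db", "Generic Diamond database", "diamond",
               "/data/diamond/generic.dmnd"]),
])


def entries_for_tool(loc_text, tool, columns=("value", "name", "tool", "path")):
    """Return the rows of a tab-separated table whose 'tool' column matches."""
    rows = []
    for line in loc_text.splitlines():
        # Skip blank lines and '#' comment lines, as in .loc sample files.
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(columns, line.split("\t")))
        if record.get("tool") == tool:
            rows.append(record)
    return rows


checkm2_rows = entries_for_tool(SAMPLE_LOC, "checkm2")
print(checkm2_rows)
```

Every tool sharing the table would need an equivalent filter, which is the extra coupling the comment above argues against for this case.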
